We propose to study the impact of the COVID-19 infodemic and of misinformation on Twitter. To do this, we extract recent popular tweets from specific locations across different countries. This helps us describe the false information that is spread with the sole purpose of causing confusion and harm. We target hashtags such as #covid19, #misinformation, #fakenews, and #disinformation to collect related posts and analyze how information processing and decision-making behaviors are compromised. We also perform sentiment analysis on the tweets to understand people's sentiments, which is crucial during this pandemic.
We primarily have two datasets: one contains tweets from the onset of the pandemic, and the other contains very recent tweets (June 2021). Our main objective is to determine how sentiments have changed over the months.
For security purposes, the skeleton code below, which extracts the tweets, uses fake credentials. We load our extracted tweets from an .rds file. (Rul, n.d.)
library(rtweet)
library(dplyr)
library(tidyr)
library(twitteR)
library(tidytext)
appname <- "CovidDistress"
key <- "ogRXvxribQAEt9tJKQ1rEd0c0"
secret <- "HlvVRoFg73JJcpcGjYxUWBagWratEIrdagPCeaiToWTKa15vCO"
access_token <- "15914217-8YYyRRAxRBL0Vu9Y0tAjVFfPvdJdYByfmsiVpLEoD"
access_secret <- "oeXIkYHBTQpGRxZCKI4q67UN3L8PuJfwb0su6EOkIk22f"
twitter_token <- create_token(
  app = appname,
  consumer_key = key,
  consumer_secret = secret,
  access_token = access_token,
  access_secret = access_secret,
  set_renv = TRUE)
corona_tweets <- search_tweets(
  q = "#covid19 OR #coronavirus",
  n = 20000,
  include_rts = FALSE,
  lang = "en",
  retryonratelimit = TRUE)
saveRDS(corona_tweets, "../data/tweets2021.rds")
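Rather than pasting keys into the script, the credentials could be read from environment variables. The following is a minimal sketch under that assumption. The helper `read_twitter_credentials()` and the variable names (`TWITTER_CONSUMER_KEY` and so on) are hypothetical and are not part of rtweet.

```r
# Hypothetical helper: read the four Twitter credentials from environment
# variables so that no secret is stored in the script itself.
# The variable names are assumptions, not an rtweet convention.
read_twitter_credentials <- function() {
  vars <- c(
    consumer_key    = "TWITTER_CONSUMER_KEY",
    consumer_secret = "TWITTER_CONSUMER_SECRET",
    access_token    = "TWITTER_ACCESS_TOKEN",
    access_secret   = "TWITTER_ACCESS_SECRET"
  )
  values <- Sys.getenv(vars, unset = NA, names = FALSE)

  # Fail early and name the variables that are unset or empty
  missing <- is.na(values) | values == ""
  if (any(missing)) {
    stop("Missing environment variables: ",
         paste(vars[missing], collapse = ", "))
  }

  names(values) <- names(vars)
  as.list(values)
}

# Usage (needs rtweet and real credentials, so it is left commented out):
# creds <- read_twitter_credentials()
# twitter_token <- rtweet::create_token(
#   app             = "CovidDistress",
#   consumer_key    = creds$consumer_key,
#   consumer_secret = creds$consumer_secret,
#   access_token    = creds$access_token,
#   access_secret   = creds$access_secret
# )
```

The variables can be set once per machine, for example in a user-level `.Renviron` file, which keeps them out of version control.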
We can now load the saved RDS file with the command below:
tweets2021_raw <- readRDS("../data/tweets2021.rds")
The dataset contains 35,725 tweets, which is more than the 20,000 we requested. This is because we set retryonratelimit to TRUE. The tweets are dated from June 17, 2021 to June 19, 2021.
Here’s a sample row from the dataset:
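The row count and date range reported here can be checked with a few base R calls. The sketch below uses a small toy data frame in place of `tweets2021_raw`; the only assumption is that the data has `status_id` and `created_at` columns, as listed in the column names of the dataset.

```r
# Toy stand-in for tweets2021_raw (assumed columns: status_id, created_at)
tweets <- data.frame(
  status_id  = c("1", "2", "3", "3"),
  created_at = as.POSIXct(
    c("2021-06-17 08:00:00", "2021-06-18 12:30:00",
      "2021-06-19 15:24:23", "2021-06-19 15:24:23"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Number of rows and the time span covered by the sample
n_tweets   <- nrow(tweets)
date_range <- range(tweets$created_at)

# Keep one row per status ID in case a tweet was returned more than once
tweets_unique <- tweets[!duplicated(tweets$status_id), ]

n_tweets
date_range
nrow(tweets_unique)
```

Running the same three checks on the real data frame shows whether the surplus over the requested number comes from additional distinct tweets or from repeated rows.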
| user_id | status_id | created_at | screen_name | text | source | display_text_width | reply_to_status_id | reply_to_user_id | reply_to_screen_name | is_quote | is_retweet | favorite_count | retweet_count | quote_count | reply_count | hashtags | symbols | urls_url | urls_t.co | urls_expanded_url | media_url | media_t.co | media_expanded_url | media_type | ext_media_url | ext_media_t.co | ext_media_expanded_url | ext_media_type | mentions_user_id | mentions_screen_name | lang | quoted_status_id | quoted_text | quoted_created_at | quoted_source | quoted_favorite_count | quoted_retweet_count | quoted_user_id | quoted_screen_name | quoted_name | quoted_followers_count | quoted_friends_count | quoted_statuses_count | quoted_location | quoted_description | quoted_verified | retweet_status_id | retweet_text | retweet_created_at | retweet_source | retweet_favorite_count | retweet_retweet_count | retweet_user_id | retweet_screen_name | retweet_name | retweet_followers_count | retweet_friends_count | retweet_statuses_count | retweet_location | retweet_description | retweet_verified | place_url | place_name | place_full_name | place_type | country | country_code | geo_coords | coords_coords | bbox_coords | status_url | name | location | description | url | protected | followers_count | friends_count | listed_count | statuses_count | favourites_count | account_created_at | verified | profile_url | profile_expanded_url | account_lang | profile_banner_url | profile_background_url | profile_image_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| x4233818847 | x1406271649062830080 | 2021-06-19 15:24:23 | Vickeysclick | No #VaccinationDrive at #Namakkal on #Sunday. @namakkal09 @Namakkalpolice #COVID19 | Twitter for Android | 83 | FALSE | FALSE | 0 | 0 | NA | NA | VaccinationDrive Namakkal Sunday COVID19 | NA | x1246397788293742593 x1113056726931009536 | namakkal09 Namakkalpolice | en | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA NA | NA NA | NA NA NA NA NA NA NA NA | https://twitter.com/Vickeysclick/status/1406271649062830080 | Vignesh Vijayakumar | Daydreamer, Journalist, fact-checker... Tweets are personal& RT's aren't endorsements | FALSE | 183 | 538 | 0 | 702 | 10241 | 2015-11-20 10:46:08 | FALSE | NA | http://abs.twimg.com/images/themes/theme1/bg.png | http://pbs.twimg.com/profile_images/667662165390835712/ZmO8TbTB_normal.jpg |
We also have a few other datasets that contain tweets from 2020 and tweets with other hashtags:
tweets2021_vaccine <- search_tweets(q = "#vaccine", n=10000, include_rts=FALSE, lang="en", retryonratelimit = TRUE)
tweets2021_vaccine_and_covid19 <- search_tweets(q = "#covid19 AND #vaccine", n=10000, include_rts=FALSE, lang="en", retryonratelimit = TRUE)
tweets2021_job <- search_tweets(q = "#job", n=10000, include_rts=FALSE, lang="en", retryonratelimit = TRUE)
tweets2021_job_covid19 <- search_tweets(q = "#covid19 AND #job", n=10000, include_rts=FALSE, lang="en", retryonratelimit = TRUE)
tweets2021_jobloss <- search_tweets(q = "#covid19 AND #jobloss", n=10000, include_rts=FALSE, lang="en", retryonratelimit = TRUE)
tweets2021_donate <- search_tweets(q = "#covid19 AND #donate", n=10000, include_rts=FALSE, lang="en", retryonratelimit = TRUE)
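Since the calls above differ only in the query string, they could be driven from a single named vector. This is a sketch rather than the code used for the report: `fetch_hashtag_sets()` and `stub_fetch()` are hypothetical names, and the stub stands in for the live API so the example runs without credentials.

```r
# Hypothetical helper: run one search per named query and return a named list.
# `fetch` defaults to rtweet::search_tweets; it is an argument so that the
# sketch can be tried with a stub instead of the live API.
fetch_hashtag_sets <- function(queries, n = 10000,
                               fetch = rtweet::search_tweets) {
  lapply(queries, function(q) {
    fetch(q = q, n = n, include_rts = FALSE, lang = "en",
          retryonratelimit = TRUE)
  })
}

# Same query strings as in the individual calls above
queries <- c(
  vaccine             = "#vaccine",
  vaccine_and_covid19 = "#covid19 AND #vaccine",
  job                 = "#job",
  job_covid19         = "#covid19 AND #job",
  jobloss             = "#covid19 AND #jobloss",
  donate              = "#covid19 AND #donate"
)

# Stub that mimics the shape of a result without contacting Twitter
stub_fetch <- function(q, n, ...) {
  data.frame(query = q, requested = n, stringsAsFactors = FALSE)
}

tweet_sets <- fetch_hashtag_sets(queries, n = 10000, fetch = stub_fetch)
names(tweet_sets)
```

Dropping the `fetch = stub_fetch` argument would send the queries to Twitter through rtweet, and each element of the resulting list could then be saved with `saveRDS()` under its name.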
To be added:
1. English Word Cloud (Old vs. New)
2. Frequency Chart (Old vs. New)
3. Positive and Negative Common Words (Old vs. New)
4. Sentiment Analysis Bar Graph (Old vs. New)
5. A world map showing tweets worldwide (Old vs. New, if possible)
6. German Word Cloud
7. Sentiment Regarding Vaccine Word Cloud/Bar Graph
8. Preferred Vaccine Word Cloud/Bar Graph
9. Mental Health Word Cloud/Bar Graph
To explore the data and extract insights efficiently, we first clean up the data and keep only the relevant columns. The raw dataset has the following 90 columns:
## [1] "user_id" "status_id"
## [3] "created_at" "screen_name"
## [5] "text" "source"
## [7] "display_text_width" "reply_to_status_id"
## [9] "reply_to_user_id" "reply_to_screen_name"
## [11] "is_quote" "is_retweet"
## [13] "favorite_count" "retweet_count"
## [15] "quote_count" "reply_count"
## [17] "hashtags" "symbols"
## [19] "urls_url" "urls_t.co"
## [21] "urls_expanded_url" "media_url"
## [23] "media_t.co" "media_expanded_url"
## [25] "media_type" "ext_media_url"
## [27] "ext_media_t.co" "ext_media_expanded_url"
## [29] "ext_media_type" "mentions_user_id"
## [31] "mentions_screen_name" "lang"
## [33] "quoted_status_id" "quoted_text"
## [35] "quoted_created_at" "quoted_source"
## [37] "quoted_favorite_count" "quoted_retweet_count"
## [39] "quoted_user_id" "quoted_screen_name"
## [41] "quoted_name" "quoted_followers_count"
## [43] "quoted_friends_count" "quoted_statuses_count"
## [45] "quoted_location" "quoted_description"
## [47] "quoted_verified" "retweet_status_id"
## [49] "retweet_text" "retweet_created_at"
## [51] "retweet_source" "retweet_favorite_count"
## [53] "retweet_retweet_count" "retweet_user_id"
## [55] "retweet_screen_name" "retweet_name"
## [57] "retweet_followers_count" "retweet_friends_count"
## [59] "retweet_statuses_count" "retweet_location"
## [61] "retweet_description" "retweet_verified"
## [63] "place_url" "place_name"
## [65] "place_full_name" "place_type"
## [67] "country" "country_code"
## [69] "geo_coords" "coords_coords"
## [71] "bbox_coords" "status_url"
## [73] "name" "location"
## [75] "description" "url"
## [77] "protected" "followers_count"
## [79] "friends_count" "listed_count"
## [81] "statuses_count" "favourites_count"
## [83] "account_created_at" "verified"
## [85] "profile_url" "profile_expanded_url"
## [87] "account_lang" "profile_banner_url"
## [89] "profile_background_url" "profile_image_url"
For more focused insights, we use only the columns “text”, “hashtags”, and “location”, and we specifically clean up the text and hashtags columns. Let’s start with a basic analysis of the top tweet locations.
# Ten most frequent self-reported user locations, ignoring empty values
tweets2021_raw %>%
  filter(!is.na(location) & location != "") %>%
  count(location, sort = TRUE) %>%
  top_n(10, n)
It is important to note, however, that the Twitter search API is focused on relevance and not completeness, so some tweets may be missing from the results: https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/overview
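The cleanup of the text and hashtags columns mentioned earlier is not shown in this section. The sketch below illustrates what such a step might look like on a toy tibble. The specific rules (dropping URLs, mentions, hashtags, and punctuation, then lower-casing) and the helper `clean_text()` are assumptions for illustration, not necessarily the rules used for the report.

```r
library(dplyr)

# Toy stand-in for tweets2021_raw; in rtweet output `hashtags` is a list column
tweets <- tibble(
  text = c(
    "No #VaccinationDrive at #Namakkal on #Sunday. @namakkal09 #COVID19",
    "Stay safe! https://t.co/abc123 #covid19"
  ),
  hashtags = list(
    c("VaccinationDrive", "Namakkal", "Sunday", "COVID19"),
    "covid19"
  ),
  location = c("Namakkal, India", "")
)

# Hypothetical cleaning rules, applied in order
clean_text <- function(x) {
  x <- gsub("https?://\\S+", " ", x)            # drop URLs
  x <- gsub("[@#]\\w+", " ", x)                 # drop mentions and hashtags
  x <- gsub("[^[:alnum:][:space:]']", " ", x)   # drop remaining punctuation
  x <- tolower(x)
  trimws(gsub("\\s+", " ", x))                  # collapse repeated whitespace
}

tweets_clean <- tweets %>%
  select(text, hashtags, location) %>%
  mutate(
    text     = clean_text(text),
    hashtags = lapply(hashtags, tolower)        # normalise hashtag case
  )

tweets_clean$text
```

Lower-casing the hashtags makes variants such as COVID19 and covid19 count as the same tag in later frequency tables.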
#install.packages("devtools")
#devtools::install_github("hadley/emo")
library(emo)
# Ten most frequently used emojis across all tweet texts
tweets2021_raw %>%
  mutate(emoji = ji_extract_all(text)) %>%
  unnest(cols = c(emoji)) %>%
  count(emoji, sort = TRUE) %>%
  top_n(10, n)
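The sentiment analysis described in the introduction is still listed among the items to be added. As a possible starting point, the sketch below counts positive and negative words using the Bing lexicon that ships with tidytext. The toy tweets are made up for illustration, and the choice of lexicon is an assumption rather than a decision taken in this report.

```r
library(dplyr)
library(tidytext)

# Made-up example tweets standing in for the cleaned text column
tweets <- tibble(
  status_id = c("1", "2", "3"),
  text = c(
    "so happy and great news about the vaccine",
    "sad and terrible day",
    "appointment booked for sunday"
  )
)

sentiment_counts <- tweets %>%
  unnest_tokens(word, text) %>%                       # one row per word
  inner_join(get_sentiments("bing"), by = "word") %>% # keep lexicon words only
  count(sentiment, sort = TRUE)                       # tally by polarity

sentiment_counts
```

Running the same pipeline separately on the 2020 and 2021 datasets would give the two sets of counts needed for an old versus new comparison.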